Thrust: A Productivity-Oriented Library for CUDA (Chapter 26)
Author
Abstract
This chapter demonstrates how to leverage the Thrust parallel template library to implement high-performance applications with minimal programming effort. Based on the C++ Standard Template Library (STL), Thrust brings a familiar high-level interface to the realm of GPU computing while remaining fully interoperable with the rest of the CUDA software ecosystem. Applications written with Thrust are concise, readable, and efficient.
Similar resources
Parallelization of Weighted Sequence Comparison by Using Ebwt
In this paper, we describe the design of a high-performance, extended Burrows-Wheeler transform-based weighted sequence comparison algorithm for many-core GPUs, taking advantage of the full programmability offered by the Compute Unified Device Architecture (CUDA) and its standard library, Thrust. Our extended Burrows-Wheeler transform-based weighted sequence comparison algorithm with Thrust library imple...
Hydra: a C++11 framework for data analysis in massively parallel platforms
Hydra is a header-only, templated and C++11-compliant framework designed to perform the typical bottleneck calculations found in common HEP data analyses on massively parallel platforms. The framework is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library and is designed to run on Linux systems, using OpenMP, CUDA and TBB enabled devices. This contribut...
GPU Acceleration for the C++ Standard Template Library
Modern programmers must exploit parallelism for performance gains, possibly through the use of an attached or on-chip GPU. To take advantage of the GPU in C++ programs, the programmer must use either a new language (CUDA or OpenCL) or an external library (Thrust). Rather than requiring that programmers learn new tools, modify existing code, and change software development practices, the C++ Sta...
Algorithmic Improvements for Portable Event-Based Monte Carlo Transport Using the Nvidia Thrust Library
High performance computing environments are progressively moving towards many-core computing architectures. The Los Alamos National Laboratory Trinity machine, available in late 2016, will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) coprocessors. The Lawrence Livermore National Laboratory Sierra machine, available in 2018, will use an IBM...
Near Real-time Pointcloud Smoothing for 3D Imaging Systems
In this project, a GPU-based implementation of Moving Least Squares is presented for smoothing 3D point clouds. We used an Xbox Kinect to generate spatial data and coded our algorithm using CUDA with the Thrust library. Our implementation uses an organized set of points and can be computed at ~7 Hz on an NVIDIA Quadro FX 4800. While perhaps not directly comparable, it has a speedup of between 30-6...